The Problem

In exploratory research, regression analysis is often applied to nonexperimental, multivariable data in an attempt to reveal relationships between a dependent variable and a set of regressor variables. Conceptually, the variation in the dependent variable may be separated into three parts:

(i) that which can be attributed to the regressor variables individually,
(ii) that which can be attributed to the regressor variables as a group,
(iii) the residual variation which is unexplained by the regression.

Parts (i) and (ii) together comprise the total variation explained by the regression. Since part (i) is composed of the so-called unique sums of squares, 1/ it will be referred to as the unique portion of the explained variation. The purpose of this note is to analyze the nonunique portion of explained variation (i.e. part (ii)), a subject which has apparently not received much attention heretofore. 2/ The motivation for doing so is the promise of a more complete understanding of the relationship of the dependent variable and the regressor variables. Since the procedure for partitioning the nonunique part may be regarded as an extension of a method for obtaining the unique sums of squares we will begin with a discussion of the latter.



1/ Also called the extra or added sums of squares.

2/ During preparation of this note, two articles were discovered which present essentially the same results as this paper but with another approach and somewhat different motivation. See R.G. Newton and D.J. Spurrell, "A Development of Multiple Regression for the Analysis of Routine Data," Applied Statistics, Vol. 16, No. 1, 1967 and "Examples of the Use of Elements for Clarifying Regression Analysis," Applied Statistics, Vol. 16, No. 2, 1967.







One measure of the variation in the dependent variable is the sum of squares of deviations of the observations from the mean. More precisely, let $Y_i$ denote the $i$-th observation on the dependent variable and let $\bar{Y}$ be the mean of $m$ observations. Then

$S(Y) = \sum_{i=1}^{m} (Y_i - \bar{Y})^2$

is the sum of squares of the deviations.

If a regression analysis is carried out using, say, three regressor variables the result is an equation of the form

$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$

Given a set of observations $X_{1i}$, $X_{2i}$ and $X_{3i}$ we can estimate the corresponding value of the dependent variable $\hat{Y}_i$. The amount of variation which is explained by the regression on $X_1$, $X_2$ and $X_3$ is

$S(X_1 X_2 X_3) = \sum_{i=1}^{m} (\hat{Y}_i - \bar{Y})^2$

and the proportion of the total variation which has been accounted for by the regression is

$S(X_1 X_2 X_3) / S(Y)$

which is the square of the multiple correlation coefficient. The residual variation which is unaccounted for by the regression will be denoted as

$\sum_{i=1}^{m} (Y_i - \hat{Y}_i)^2$

The procedure for obtaining the unique contribution of, say, $X_1$ to the sum of squares is to regress Y against $X_2$ and $X_3$. This yields $S(X_2 X_3)$, the variation accounted for by the variables $X_2$ and $X_3$. The unique sum of squares associated with $X_1$ is now defined to be

$\gamma_1(X_1) = S(X_1 X_2 X_3) - S(X_2 X_3)$

$\gamma_1(X_1)$ represents the additional explained sum of squares when $X_1$ is added to the regression last. (In actual computation $\gamma_1(X_1)$ can be computed in another way so that it is not necessary to run two complete regressions.)
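The definitions above can be sketched in Python as follows. This is a minimal illustration only: the data values are made up, the helper names `solve`, `total_ss` and `explained_ss` are hypothetical, and the fit uses the normal equations of ordinary least squares with an intercept. The unique sum of squares is obtained here by the two-regression route, not by the shortcut mentioned in the parenthetical remark.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def total_ss(y: Sequence[float]) -> float:
    """S(Y): sum of squared deviations of the observations from their mean."""
    ybar = sum(y) / len(y)
    return sum((v - ybar) ** 2 for v in y)


def explained_ss(y: Sequence[float], regressors: Sequence[Sequence[float]]) -> float:
    """S(...): sum of squares explained by regressing y on the given regressors."""
    m_obs = len(y)
    cols = [[1.0] * m_obs] + [list(x) for x in regressors]  # intercept first
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    b = solve(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [sum(bj * col[i] for bj, col in zip(b, cols)) for i in range(m_obs)]
    ybar = sum(y) / m_obs
    return sum((f - ybar) ** 2 for f in fitted)


# Hypothetical observations (m = 6) on Y and three regressors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x3 = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 10.0]

s_total = total_ss(y)                    # S(Y)
s_full = explained_ss(y, [x1, x2, x3])   # S(X1 X2 X3)
s_23 = explained_ss(y, [x2, x3])         # S(X2 X3)
r_squared = s_full / s_total             # squared multiple correlation
unique_x1 = s_full - s_23                # unique sum of squares of X1
print(s_total, s_full, r_squared, unique_x1)
```

Repeating the last subtraction with $S(X_1 X_3)$ and $S(X_1 X_2)$ gives the unique sums of squares for $X_2$ and $X_3$.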

Using Table 1 we may now look again at our way of classifying
the total variation. At this stage of the development the nonunique 
portion is shown simply as the difference between the total variation 
and the other two known components. Before proceeding to the analysis 
some interpretive remarks may be helpful. 



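Table 1

Classification of the Variation     Sum of Squares

Unique Portion                      $\gamma_1(X_1) + \gamma_1(X_2) + \gamma_1(X_3)$
Nonunique Portion                   $S(Y) - \gamma_1(X_1) - \gamma_1(X_2) - \gamma_1(X_3) - \sum_{i=1}^{m} (Y_i - \hat{Y}_i)^2$
Residual                            $\sum_{i=1}^{m} (Y_i - \hat{Y}_i)^2$
Total                               $S(Y)$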



The unique contributions of the variables are often calculated because they provide an indication of the relative importance of the several regressor variables in explaining the dependent variable. They are potentially misleading however because they neglect the nonunique portion of explained variation. Looking at the origin of this variation will illuminate the problem.

To examine the nonunique variation more deeply requires the notion of orthogonality among the data vectors, a concept developed in some detail in books such as Goldberger 3/ or Draper and Smith 4/. Briefly, it can be explained as follows. Let $X_i$ and $X_j$ now be column vectors of observations on the $i$-th and $j$-th variables expressed as deviations from the mean. If there are $m$ observations these will be $m$-dimensional data vectors. Two such vectors are said to be orthogonal if $X_i' X_j = 0$, that is if the vectors are at right angles to one another in $m$-space. In general, the cosine of the angle between the vectors is equal to the simple correlation coefficient between the two variables. If a set of data vectors are mutually orthogonal then they are uncorrelated with one another and the unique sums of squares will add up to the total explained sum of squares. In other words, each regressor variable is bringing new information to the regression.
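The relation between the angle and the correlation coefficient can be checked numerically. The sketch below is only an illustration with made-up data vectors; the function names are hypothetical. The cosine is computed from the vectors of deviations, and the correlation from the usual raw-score formula, so the two routes are computed differently.

```python
import math


def centered(v):
    """Express observations as deviations from their mean."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def cosine(u, v):
    """Cosine of the angle between two data vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def correlation(x, y):
    """Simple correlation coefficient from raw sums of squares and products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) - n * mx * my
    sxx = sum(a * a for a in x) - n * mx * mx
    syy = sum(b * b for b in y) - n * my * my
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observations on three variables (m = 4).
x_a = [1.0, 2.0, 3.0, 4.0]
x_b = [1.0, -1.0, -1.0, 1.0]   # orthogonal to x_a after centering
x_c = [2.0, 1.0, 4.0, 3.0]     # correlated with x_a

cos_ab = cosine(centered(x_a), centered(x_b))
cos_ac = cosine(centered(x_a), centered(x_c))
print(cos_ab, correlation(x_a, x_b))
print(cos_ac, correlation(x_a, x_c))
```

For the first pair the inner product of the deviation vectors is zero, so the vectors are orthogonal and the variables uncorrelated; for the second pair both routes give the same nonzero value.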



3/ A.S. Goldberger, Econometric Theory (New York: John Wiley, 1964).
4/ N.R. Draper and H. Smith, Applied Regression Analysis (New York: John Wiley, 1966).

If the regressors are to some extent redundant, however, we have a departure from orthogonality. When the departure is considerable this condition is known as multicollinearity and is due to high intercorrelations among the regressor variables. 5/ The nonunique portion of explained variation is then nonzero and the problem of determining the contributions of individual variables is inherently ambiguous. The next section, however, describes a way of partitioning the nonunique variation which can be an aid to interpreting the results of regression analyses.

The Three Variable Case 

Definition of Commonalities

For ease of exposition, the following discussion of the partitioning procedure will be restricted to three independent variables. The equations required for the n-variable case are given in the Appendix.

First, recall that the unique contribution of $X_1$ was defined as the sum of squares with all three variables less the sum of squares associated with the regression on $X_2$ and $X_3$, i.e.

$\gamma_1(X_1) = S(X_1 X_2 X_3) - S(X_2 X_3)$

or in general

(1)  $\gamma_1(X_i) = S(X_i X_j X_k) - S(X_j X_k)$;

Here and subsequently in the three variable case $i = 1, 2, 3$ and $i \neq j \neq k$.



5/ In the extreme multicollinearity case, a data vector can be written as an exact linear combination of other data vectors. This condition makes regression analysis impossible because matrix inversion cannot be carried out.




This notion has a direct extension in the sense that $S(X_i X_j X_k) - S(X_k)$ is a measure of the effect of adding the variables $X_i$ and $X_j$ to the regression last. We may express this by writing

(2)  $S(X_i X_j X_k) - S(X_k) = \gamma_1(X_i) + \gamma_1(X_j) + \gamma_2(X_i X_j)$

where $\gamma_2(X_i X_j)$ is that part of the difference in the sum of squares which may be associated with $X_i$ or $X_j$. It may be regarded as that part attributable to $X_i$ and $X_j$ in common, or for short, the commonality of $X_i X_j$. In particular since there are two variables involved it will be referred to as a second-order commonality.

Rearranging equation (2) provides us with a definition of second order commonalities, viz.

(3)  $\gamma_2(X_i X_j) = S(X_i X_j X_k) - S(X_k) - \gamma_1(X_i) - \gamma_1(X_j)$

The definition is recursive in that the unique sums of squares have been 
defined by equation (1). 



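In analogy with equation (2) we may write the total sum of squares attributable to a three variable regression as

(4)  $S(X_i X_j X_k) = \gamma_1(X_i) + \gamma_1(X_j) + \gamma_1(X_k) + \gamma_2(X_i X_j) + \gamma_2(X_i X_k) + \gamma_2(X_j X_k) + \gamma_3(X_i X_j X_k)$

Then rearranging equation (4) we have the definition of the third-order commonality.

(5)  $\gamma_3(X_i X_j X_k) = S(X_i X_j X_k) - \gamma_1(X_i) - \gamma_1(X_j) - \gamma_1(X_k) - \gamma_2(X_i X_j) - \gamma_2(X_i X_k) - \gamma_2(X_j X_k)$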



Reduced Form of the Commonalities

Collecting the three equations defining commonalities we have

(1)  $\gamma_1(X_i) = S(X_i X_j X_k) - S(X_j X_k)$;

(3)  $\gamma_2(X_i X_j) = S(X_i X_j X_k) - S(X_k) - \gamma_1(X_i) - \gamma_1(X_j)$;

(5)  $\gamma_3(X_i X_j X_k) = S(X_i X_j X_k) - \gamma_1(X_i) - \gamma_1(X_j) - \gamma_1(X_k) - \gamma_2(X_i X_j) - \gamma_2(X_i X_k) - \gamma_2(X_j X_k)$

The commonality terms on the right side of this set of recursive equations can now be eliminated yielding

(6)  $\gamma_1(X_i) = S(X_i X_j X_k) - S(X_j X_k)$

(7)  $\gamma_2(X_i X_j) = -S(X_i X_j X_k) + S(X_i X_k) + S(X_j X_k) - S(X_k)$

(8)  $\gamma_3(X_i X_j X_k) = S(X_i X_j X_k) - S(X_i X_j) - S(X_i X_k) - S(X_j X_k) + S(X_i) + S(X_j) + S(X_k)$
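That the reduced forms agree with the recursive definitions can be checked with a small Python sketch. The explained sums of squares in `S` are hypothetical round numbers chosen for illustration, and the function names are not from the source.

```python
# Hypothetical explained sums of squares S(.) for every subset of three regressors.
S = {"1": 40.0, "2": 35.0, "3": 20.0,
     "12": 55.0, "13": 50.0, "23": 45.0,
     "123": 60.0}


def s_of(*names):
    """Look up S for a set of variables regardless of the order given."""
    return S["".join(sorted(names))]


def unique(i, j, k):
    """Equations (1) and (6): unique sum of squares of X_i."""
    return s_of(i, j, k) - s_of(j, k)


def second_recursive(i, j, k):
    """Equation (3): second-order commonality of X_i X_j, recursive form."""
    return s_of(i, j, k) - s_of(k) - unique(i, j, k) - unique(j, i, k)


def second_reduced(i, j, k):
    """Equation (7): second-order commonality of X_i X_j, reduced form."""
    return -s_of(i, j, k) + s_of(i, k) + s_of(j, k) - s_of(k)


def third_recursive(i, j, k):
    """Equation (5): third-order commonality, recursive form."""
    firsts = unique(i, j, k) + unique(j, i, k) + unique(k, i, j)
    seconds = (second_recursive(i, j, k) + second_recursive(i, k, j)
               + second_recursive(j, k, i))
    return s_of(i, j, k) - firsts - seconds


def third_reduced(i, j, k):
    """Equation (8): third-order commonality, reduced form."""
    return (s_of(i, j, k) - s_of(i, j) - s_of(i, k) - s_of(j, k)
            + s_of(i) + s_of(j) + s_of(k))


pairs = [("1", "2", "3"), ("1", "3", "2"), ("2", "3", "1")]
for i, j, k in pairs:
    print(i + j, second_recursive(i, j, k), second_reduced(i, j, k))
print("123", third_recursive("1", "2", "3"), third_reduced("1", "2", "3"))
```

With these numbers the seven terms add back to the hypothetical $S(X_1 X_2 X_3)$, as equation (4) requires.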

The foregoing way of expressing the commonalities will be referred to as the reduced form.

Dividing (6), (7) and (8) by S(Y) gives alternative forms which will be called commonality coefficients.

(9)  $U(X_i) = \gamma_1(X_i)/S(Y) = R^2(X_i X_j X_k) - R^2(X_j X_k)$;

(10)  $C(X_i X_j) = \gamma_2(X_i X_j)/S(Y) = -R^2(X_i X_j X_k) + R^2(X_i X_k) + R^2(X_j X_k) - R^2(X_k)$;

(11)  $C(X_i X_j X_k) = \gamma_3(X_i X_j X_k)/S(Y) = R^2(X_i X_j X_k) - R^2(X_i X_j) - R^2(X_i X_k) - R^2(X_j X_k) + R^2(X_i) + R^2(X_j) + R^2(X_k)$

In this way the part of the total variation associated with a combination of variables can be calculated from the appropriate multiple correlation coefficients.

Partitioning the Sums of Squares

Using the commonality definitions in equations (1), (3) and (5), the sum of squares attributable to a regression can be written in an interesting form. Taking equations (3) and (5) and solving for $S(X_k)$ gives

(12)  $S(X_k) = \gamma_1(X_k) + \gamma_2(X_i X_k) + \gamma_2(X_j X_k) + \gamma_3(X_i X_j X_k)$;

solving equations (1) and (5) for $S(X_j X_k)$ gives

(13)  $S(X_j X_k) = \gamma_1(X_j) + \gamma_1(X_k) + \gamma_2(X_i X_j) + \gamma_2(X_i X_k) + \gamma_2(X_j X_k) + \gamma_3(X_i X_j X_k)$

and solving equation (5) for $S(X_i X_j X_k)$ gives

(14)  $S(X_i X_j X_k) = \gamma_1(X_i) + \gamma_1(X_j) + \gamma_1(X_k) + \gamma_2(X_i X_j) + \gamma_2(X_i X_k) + \gamma_2(X_j X_k) + \gamma_3(X_i X_j X_k)$.

In each equation the sum of squares due to the regression is equal to the sum of all unique sums of squares and commonalities associated with the regressor variables. The meaning of the commonalities is now clear in terms of equation (12). For example, $\gamma_2(X_i X_k)$ is that part of the sum of squares which is common to $S(X_i)$ and $S(X_k)$ and no other; $\gamma_3(X_i X_j X_k)$ is common to $S(X_i)$, $S(X_j)$ and $S(X_k)$ and no other; etc.
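The step from equations (3) and (5) to equation (12) can be written out as follows. This is a worked sketch using the symbol $\gamma$ for the sums of squares as above; it adds no assumption beyond the two definitions.

```latex
\begin{align*}
\gamma_3(X_iX_jX_k) &= S(X_iX_jX_k) - \gamma_1(X_i) - \gamma_1(X_j) - \gamma_1(X_k)
    - \gamma_2(X_iX_j) - \gamma_2(X_iX_k) - \gamma_2(X_jX_k)
    && \text{equation (5)} \\
S(X_k) &= S(X_iX_jX_k) - \gamma_1(X_i) - \gamma_1(X_j) - \gamma_2(X_iX_j)
    && \text{equation (3), rearranged} \\
\gamma_3(X_iX_jX_k) &= S(X_k) - \gamma_1(X_k) - \gamma_2(X_iX_k) - \gamma_2(X_jX_k)
    && \text{substituting the second line into the first} \\
S(X_k) &= \gamma_1(X_k) + \gamma_2(X_iX_k) + \gamma_2(X_jX_k) + \gamma_3(X_iX_jX_k)
    && \text{equation (12)}
\end{align*}
```

Equation (13) follows in the same way, since by equation (1) $S(X_j X_k)$ is the full sum in equation (14) less $\gamma_1(X_i)$.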









Groups of Variables

The equations presented can easily be extended to apply to groups of regressor variables. For example, let F represent the set of variables $X_1$, $X_2$, $X_3$; let G represent the set $X_4$, $X_5$ and let H represent $X_6$, $X_7$ and $X_8$.

Then

$R^2(F) = R^2(X_1 X_2 X_3)$

$R^2(G) = R^2(X_4 X_5)$

$R^2(H) = R^2(X_6 X_7 X_8)$

$R^2(FG) = R^2(X_1 X_2 X_3 X_4 X_5)$

and so on.

The commonality coefficient for two sets of variables, say F and G, is then

$C(FG) = R^2(FGH) - R^2(H) - C(F) - C(G)$

by extension of equation (3), where $C(F)$ and $C(G)$ are the first-order (unique) coefficients, written $U(F)$ and $U(G)$ in the notation of equation (9). Other commonality coefficients follow directly. Note that interpretation of the coefficients must be modified slightly now; for example, a unique contribution may refer to the variation explained by a group of variables rather than a single one. The grouping device will be used in the example which follows.

An Example

To illustrate the use of commonalities we shall use some data collected as part of the Educational Opportunities Survey (EOS). 6/

6/ Coleman, J.S., et al., Equality of Educational Opportunity. U.S. Department of Health, Education and Welfare; National Center for Educational Statistics, (OE-38001). Washington, D.C. 1966, U.S. Government Printing Office Catalog No. FS 5.238:38001.




The dependent variable is an index of achievement developed from the EOS data. 7/ A large number of regressor variables were separated into four groups as follows: student background variables (B), teacher variables (T), school program variables (P) and school facilities variables (F). Since this note is primarily concerned with methodology we need not go deeper into the nature of these variables; readers more interested in the content should refer to Mayeske, et al. 8/

Table 2 displays the set of commonality coefficients for the data. For second and higher orders a table entry is made for each set of variables with which a coefficient is associated.



7/ Mayeske, G.W. and F.D. Weinfeld, Factor Analysis of Achievement Measures From the Educational Opportunities Survey. Division of Operations Analysis, Technical Note No. 21, January 18, 1967.

8/ Mayeske, G.W., F.D. Weinfeld, A.E. Beaton, Jr., and J.M. Proshek, Correlational and Regression Analysis of Differences Between the Achievement Levels of Ninth Grade Schools from the Educational Opportunities Survey. Division of Operations Analysis, Unpublished manuscript.

Table 2

Commonality Coefficients

                            Sets of Regressor Variables
Order    Coefficient           B         T         P         F

First    U(B)              .1061
         U(T)                        .0167
         U(P)                                  .0125
         U(F)                                            .0038

Second   C(BT)             .4891     .4891
         C(BP)             .0137               .0137
         C(BF)             .0004                         .0004
         C(TP)                       .0066     .0066
         C(TF)                      -.0009              -.0009
         C(PF)                                 .0050     .0050

Third    C(BTP)            .1197     .1197     .1197
         C(BTF)            .0304     .0304               .0304
         C(BPF)            .0052               .0052     .0052
         C(TPF)                      .0018     .0018     .0018

Fourth   C(BTPF)           .0561     .0561     .0561     .0561

$R^2$ for a Single Set
of Variables               .8207     .7195     .2206     .1018
The last row of the table sums the coefficients for each column thus giving the square of the multiple correlation coefficient for each individual set of variables as suggested by equation (12). The sum of all coefficients is $R^2(BTPF) = .8662$.
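The column sums and the overall total can be checked with a few lines of Python. The coefficient values below are transcribed from Table 2, with C(BF) and C(BTF) taken as positive, the signs under which every column reproduces its printed total; the dictionary layout and the function name are illustrative choices only.

```python
# Commonality coefficients from Table 2; each key names the sets involved.
coeff = {
    "B": 0.1061, "T": 0.0167, "P": 0.0125, "F": 0.0038,
    "BT": 0.4891, "BP": 0.0137, "BF": 0.0004,
    "TP": 0.0066, "TF": -0.0009, "PF": 0.0050,
    "BTP": 0.1197, "BTF": 0.0304, "BPF": 0.0052, "TPF": 0.0018,
    "BTPF": 0.0561,
}


def r2_single(group):
    """Sum every coefficient whose label involves `group` (one table column)."""
    return round(sum(v for k, v in coeff.items() if group in k), 4)


column_sums = {g: r2_single(g) for g in "BTPF"}
r2_all = round(sum(coeff.values()), 4)
print(column_sums)
print(r2_all)
```

Each column sum is the four-group analogue of equation (12), and the grand total is the analogue of equation (14).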

This set of results shows how knowledge of the higher order commonalities can provide additional insight into the relationship among the variables. Looking at only the unique contributions of the variables suggests that the student background variables with U(B) = .1061 outweigh the others in explaining achievement. However, the second order coefficient between student background and teacher variables is .4891, the largest of all coefficients. This means that when these two sets of variables are added to the regression last, the reduction in unexplained variance is substantially more than can be attributed to each set individually on the basis of first order coefficients. Though we are not able to further separate this joint contribution we are at least warned that the effect of the teacher variables may be much greater than was indicated by the first order coefficients.

The joint effects also carry through to higher order commonality coefficients. Thus C(BTP) is .1197, the second largest coefficient, and C(BTPF) is .0561, the fourth largest. Consequently, one has good reason to guard against rejecting school variables in general and especially teacher variables as determinants of achievement. Resolution of the ambiguity is another matter however for it is generally agreed that correcting the effects of multicollinearity requires the acquisition of new data. 9/

The perhaps unexpected result that commonalities can be negative is evident from Table 2. Re-examination of their development shows that, unlike the unique sums of squares, they are not constrained from being negative. Exactly how negative commonalities should be interpreted is, at this time, an open question though the previously cited reference by Newton and Spurrell offers a geometric explanation for their occurrence.
Acknowledgement

9/ See for example, J. Johnston, Econometric Methods (New York: McGraw-Hill, 1963), p. 207.




APPENDIX 



Partitioning in the n-variable Case

Writing out expressions for commonalities in the n-variable case requires some set notation to keep track of the variables. We will use the following:

$V^n$                   The set of all variables $X_1, X_2, \ldots, X_n$.
$V_j$                   A subset of $V^n$ containing $j$ members.
$\bar{V}_j$             The complement of $V_j$.
$V_i$                   A subset of $V_j$ containing $i$ members.
$\{V_i\}$               The set of all possible $V_i$ for a given $V_j$. The number of members in the set is $\binom{j}{i} = \frac{j!}{i!\,(j-i)!}$.
$V_i \cup \bar{V}_j$    The union of the sets $V_i$ and $\bar{V}_j$.



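For the three variable case the set memberships are given in Table A1. The commonality definitions may be written as

$\gamma_1(V_1) = S(V^n) - S(\bar{V}_1)$

$\gamma_2(V_2) = S(V^n) - S(\bar{V}_2) - \sum_{\{V_1\}} \gamma_1(V_1)$

$\gamma_3(V_3) = S(V^n) - S(\bar{V}_3) - \sum_{\{V_1\}} \gamma_1(V_1) - \sum_{\{V_2\}} \gamma_2(V_2)$

$\gamma_k(V_k) = S(V^n) - S(\bar{V}_k) - \sum_{\{V_1\}} \gamma_1(V_1) - \sum_{\{V_2\}} \gamma_2(V_2) - \cdots - \sum_{\{V_{k-1}\}} \gamma_{k-1}(V_{k-1})$

$\gamma_n(V^n) = S(V^n) - \sum_{\{V_1\}} \gamma_1(V_1) - \sum_{\{V_2\}} \gamma_2(V_2) - \cdots - \sum_{\{V_{n-1}\}} \gamma_{n-1}(V_{n-1})$

In each line the sums run over the subsets of the set named on the left side.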
Table A1

Set Symbol               Set Memberships

$V_1$                    $X_1$           $X_2$           $X_3$
$\bar{V}_1$              $X_2X_3$        $X_1X_3$        $X_1X_2$

$V_2$                    $X_1X_2$        $X_1X_3$        $X_2X_3$
$\bar{V}_2$              $X_3$           $X_2$           $X_1$
$\{V_1\}$                $X_1$, $X_2$    $X_1$, $X_3$    $X_2$, $X_3$

$V_3$                    $X_1X_2X_3$
$\bar{V}_3$              empty
$\{V_1\}$                $X_1$, $X_2$, $X_3$
$\{V_2\}$                $X_1X_2$, $X_1X_3$, $X_2X_3$



The number of $k$-th order commonalities is $n!/k!(n-k)!$ and consequently the total number of commonalities is $2^n - 1$.
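The counts are easy to tabulate. The following short sketch uses Python's standard `math.comb`; the function name `commonality_counts` is an illustrative choice.

```python
from math import comb


def commonality_counts(n):
    """Number of k-th order commonalities for k = 1, ..., n."""
    return {k: comb(n, k) for k in range(1, n + 1)}


counts = commonality_counts(4)   # four sets of variables, as in Table 2
total = sum(counts.values())
print(counts, total)
```

For four sets of variables this gives the fifteen coefficients listed in Table 2, and the total grows quickly as more variables or sets are added.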

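The reduced forms may be written as

$\gamma_1(V_1) = S(V^n) - S(\bar{V}_1)$

$\gamma_2(V_2) = -S(V^n) + \sum_{\{V_1\}} S(V_1 \cup \bar{V}_2) - S(\bar{V}_2)$

$\gamma_3(V_3) = S(V^n) - \sum_{\{V_2\}} S(V_2 \cup \bar{V}_3) + \sum_{\{V_1\}} S(V_1 \cup \bar{V}_3) - S(\bar{V}_3)$

$\gamma_k(V_k) = (-1)^{k+1} \left[ S(V^n) - \sum_{\{V_{k-1}\}} S(V_{k-1} \cup \bar{V}_k) + \sum_{\{V_{k-2}\}} S(V_{k-2} \cup \bar{V}_k) - \cdots + (-1)^{k-1} \sum_{\{V_1\}} S(V_1 \cup \bar{V}_k) \right] - S(\bar{V}_k)$

$\gamma_n(V^n) = (-1)^{n+1} \left[ S(V^n) - \sum_{\{V_{n-1}\}} S(V_{n-1}) + \sum_{\{V_{n-2}\}} S(V_{n-2}) - \cdots + (-1)^{n-1} \sum_{\{V_1\}} S(V_1) \right]$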
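One way to evaluate the reduced form for any number of variables is sketched below. It assumes the explained sums of squares are available for every nonempty subset; the values in `S` are hypothetical, and the function name and the use of `frozenset` keys are illustrative choices. Each term is $S$ of part of the subset joined with the complement, with sign $+$ when the part has an odd number of members and $-$ otherwise.

```python
from itertools import combinations


def commonality(S, subset, all_vars):
    """Reduced-form commonality of `subset` within the variables `all_vars`."""
    subset = frozenset(subset)
    rest = frozenset(all_vars) - subset          # the complement
    total = 0.0
    for size in range(len(subset) + 1):
        for part in combinations(sorted(subset), size):
            args = frozenset(part) | rest
            if args:                             # S of the empty set is zero
                total += (-1) ** (size + 1) * S[args]
    return total


# Hypothetical explained sums of squares for three regressors.
S = {
    frozenset({1}): 40.0, frozenset({2}): 35.0, frozenset({3}): 20.0,
    frozenset({1, 2}): 55.0, frozenset({1, 3}): 50.0, frozenset({2, 3}): 45.0,
    frozenset({1, 2, 3}): 60.0,
}
ALL = (1, 2, 3)

parts = {
    frozenset(c): commonality(S, c, ALL)
    for size in range(1, len(ALL) + 1)
    for c in combinations(ALL, size)
}
for members, value in parts.items():
    print(sorted(members), value)
```

Applied to sets of variables instead of single variables, the same routine gives coefficients of the kind shown in Table 2 when `S` holds squared multiple correlations.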



An alternative way of keeping track of the variables for the reduced form is by a symbolic form for the argument. We give an example first and then write the general form. A second order commonality in the three variable case may be written as

$\gamma_2(V_2) = \gamma_2(X_i X_j) = S[-(1 - X_i)(1 - X_j) X_k]$

The meaning of the symbolic expression on the right is that the product in the brackets is first multiplied out and then the absolute value of each term becomes an argument for a sum of squares with the sign of the term carrying over as the sign on the sum of squares. That is,

$\gamma_2(X_i X_j) = S[-(1 - X_i)(1 - X_j) X_k]$

$= S[-X_k + X_j X_k + X_i X_k - X_i X_j X_k]$

and upon converting from the symbolic form we have

(A1)  $\gamma_2(X_i X_j) = -S(X_i X_j X_k) + S(X_j X_k) + S(X_i X_k) - S(X_k)$.

Equation (A1) is the same as equation (7).
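The multiplying-out step can be mimicked mechanically. In the sketch below, which is an illustration with hypothetical function names, each variable in the commonality contributes either 1 or $-X$ to a term, the remaining variables multiply every term, and the leading minus sign flips each sign once.

```python
from itertools import combinations


def expand_symbolic(in_common, others=()):
    """Multiply out -(1 - X_a)(1 - X_b)... X_c X_d... as {variables: sign}."""
    terms = {}
    for size in range(len(in_common) + 1):
        for chosen in combinations(in_common, size):
            variables = frozenset(chosen) | frozenset(others)
            if variables:  # a bare constant has no sum of squares attached
                # `size` factors contribute -X; the leading minus flips once more
                terms[variables] = (-1) ** (size + 1)
    return terms


def as_formula(terms):
    """Render the signed terms as sums of squares, smallest sets first."""
    ordered = sorted(terms.items(), key=lambda item: (len(item[0]), sorted(item[0])))
    return " ".join(
        ("+" if sign > 0 else "-") + " S(" + "".join(sorted(vs)) + ")"
        for vs, sign in ordered
    )


second_order = expand_symbolic(("i", "j"), ("k",))
print(as_formula(second_order))
```

For the second-order case this prints the four signed sums of squares of equation (A1); with no remaining variables the constant term is dropped, which corresponds to the form $S[1 - (1 - X_1) \cdots (1 - X_n)]$ used for the highest-order commonality.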

Using the symbolic form we may write the commonalities which compose $S(X_1)$ as

$\gamma_1(X_1) = S[-(1 - X_1) X_2 X_3 \cdots X_n]$

$\gamma_2(X_1 X_2) = S[-(1 - X_1)(1 - X_2) X_3 X_4 \cdots X_n]$

$\gamma_{n-1}(X_1 X_2 \cdots X_{n-1}) = S[-(1 - X_1)(1 - X_2) \cdots (1 - X_{n-1}) X_n]$

$\gamma_n(X_1 X_2 \cdots X_n) = S[1 - (1 - X_1)(1 - X_2) \cdots (1 - X_n)]$



The other commonalities have analogous forms. 



